Setup

## **** __Utilized Cores__ **** = 2
## $subsetGenes
## [1] "protein_coding"
## 
## $subsetCells
## [1] 500
## 
## $resolution
## [1] 0.6
## 
## $resultsPath
## [1] "./"
## 
## $nCores
## [1] 2
## 
## $perplexity
## [1] 30

./

Load Libraries

NOTE:: When Seurat updated from v2 to v3, they made a lot of problematic changes to the function names and arguments that cause a ton of errors.

## R version 3.6.2 (2019-12-12)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
##  [1] parallel  stats4    grid      stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] monocle_2.99.3              L1Graph_0.1.1              
##  [3] lpSolveAPI_5.5.2.0-17.6     DDRTree_0.1.5              
##  [5] irlba_2.3.3                 igraph_1.2.4.2             
##  [7] Matrix_1.2-18               DESeq2_1.24.0              
##  [9] SummarizedExperiment_1.14.1 DelayedArray_0.10.0        
## [11] BiocParallel_1.18.1         matrixStats_0.55.0         
## [13] Biobase_2.44.0              GenomicRanges_1.36.1       
## [15] GenomeInfoDb_1.20.0         IRanges_2.18.3             
## [17] S4Vectors_0.22.1            BiocGenerics_0.30.0        
## [19] biomaRt_2.40.5              EnhancedVolcano_1.2.0      
## [21] ggrepel_0.8.1               ComplexHeatmap_2.0.0       
## [23] DT_0.12                     shiny_1.4.0                
## [25] reshape2_1.4.3              viridis_0.5.1              
## [27] viridisLite_0.3.0           plotly_4.9.2               
## [29] ggplot2_3.3.0               knitr_1.28                 
## [31] gridExtra_2.3               dplyr_0.8.4                
## [33] Seurat_3.1.4               
## 
## loaded via a namespace (and not attached):
##   [1] coda_0.19-3             tidyr_1.0.2             acepack_1.4.1          
##   [4] bit64_0.9-7             multcomp_1.4-12         data.table_1.12.8      
##   [7] rpart_4.1-15            RCurl_1.98-1.1          doParallel_1.0.15      
##  [10] metap_1.3               cowplot_1.0.0           TH.data_1.0-10         
##  [13] RSQLite_2.2.0           RANN_2.6.1              VGAM_1.1-2             
##  [16] future_1.16.0           bit_1.1-15.2            mutoss_0.1-12          
##  [19] webshot_0.5.2           httpuv_1.5.2            assertthat_0.2.1       
##  [22] xfun_0.12               hms_0.5.3               evaluate_0.14          
##  [25] promises_1.1.0          progress_1.2.2          caTools_1.18.0         
##  [28] DBI_1.1.0               geneplotter_1.62.0      htmlwidgets_1.5.1      
##  [31] sparsesvd_0.2           spdep_1.1-3             purrr_0.3.3            
##  [34] crosstalk_1.0.0         backports_1.1.5         annotate_1.62.0        
##  [37] gbRd_0.4-11             deldir_0.1-25           RcppParallel_4.4.4     
##  [40] vctrs_0.2.3             ROCR_1.0-7              withr_2.1.2            
##  [43] checkmate_2.0.0         sctransform_0.2.1       prettyunits_1.1.1      
##  [46] mnormt_1.5-6            cluster_2.1.0           ape_5.3                
##  [49] lazyeval_0.2.2          crayon_1.3.4            genefilter_1.66.0      
##  [52] units_0.6-5             glmnet_3.0-2            pkgconfig_2.0.3        
##  [55] slam_0.1-47             nlme_3.1-145            nnet_7.3-13            
##  [58] rlang_0.4.5             globals_0.12.5          lifecycle_0.2.0        
##  [61] miniUI_0.1.1.1          sandwich_2.5-1          rsvd_1.0.3             
##  [64] lmtest_0.9-37           raster_3.0-12           boot_1.3-24            
##  [67] zoo_1.8-7               base64enc_0.1-3         ggridges_0.5.2         
##  [70] GlobalOptions_0.1.1     pheatmap_1.0.12         png_0.1-7              
##  [73] rjson_0.2.20            bitops_1.0-6            KernSmooth_2.23-16     
##  [76] blob_1.2.1              rgl_0.100.50            classInt_0.4-2         
##  [79] shape_1.4.4             stringr_1.4.0           manipulateWidget_0.10.1
##  [82] jpeg_0.1-8.1            scales_1.1.0            memoise_1.1.0          
##  [85] magrittr_1.5            plyr_1.8.6              ica_1.0-2              
##  [88] gplots_3.0.3            bibtex_0.4.2.2          gdata_2.18.0           
##  [91] zlibbioc_1.30.0         compiler_3.6.2          HSMMSingleCell_1.4.0   
##  [94] lsei_1.2-0              RColorBrewer_1.1-2      plotrix_3.7-7          
##  [97] clue_0.3-57             fitdistrplus_1.0-14     LearnBayes_2.15.1      
## [100] XVector_0.24.0          listenv_0.8.0           patchwork_1.0.0        
## [103] pbapply_1.4-2           htmlTable_1.13.3        Formula_1.2-3          
## [106] MASS_7.3-51.5           tidyselect_1.0.0        stringi_1.4.6          
## [109] densityClust_0.3        yaml_2.2.1              locfit_1.5-9.1         
## [112] latticeExtra_0.6-29     pbmcapply_1.5.0         tools_3.6.2            
## [115] future.apply_1.4.0      circlize_0.4.8          rstudioapi_0.11        
## [118] foreach_1.4.8           foreign_0.8-76          Rtsne_0.15             
## [121] digest_0.6.25           FNN_1.1.3               qlcMatrix_0.9.7        
## [124] Rcpp_1.0.3              later_1.0.0             RcppAnnoy_0.0.15       
## [127] httr_1.4.1              AnnotationDbi_1.46.1    sf_0.8-1               
## [130] npsurv_0.4-0            Rdpack_0.11-1           colorspace_1.4-1       
## [133] XML_3.99-0.3            reticulate_1.14         splines_3.6.2          
## [136] uwot_0.1.5              expm_0.999-4            sn_1.5-5               
## [139] sp_1.4-1                multtest_2.40.0         spData_0.3.3           
## [142] xtable_1.8-4            jsonlite_1.6.1          R6_2.4.1               
## [145] gmodels_2.18.1          TFisher_0.2.0           Hmisc_4.3-1            
## [148] pillar_1.4.3            htmltools_0.4.0         mime_0.9               
## [151] glue_1.3.1              fastmap_1.0.1           class_7.3-15           
## [154] codetools_0.2-16        tsne_0.1-3              mvtnorm_1.1-0          
## [157] lattice_0.20-40         tibble_2.1.3            numDeriv_2016.8-1.1    
## [160] leiden_0.3.3            gtools_3.8.1            survival_3.1-8         
## [163] limma_3.40.6            rmarkdown_2.1           docopt_0.6.1           
## [166] fastICA_1.2-2           munsell_0.5.0           e1071_1.7-3            
## [169] GetoptLong_0.1.8        GenomeInfoDbData_1.2.1  iterators_1.0.12       
## [172] gtable_0.3.0
## [1] "Seurat  3.1.4"

Load Data

Since the original Seurat object was made with an older version of Seurat, it has to be converted to the updated Seurat V3 object.

## Updating from v2.X to v3.X
## Validating object structure
## Updating object slots
## Ensuring keys are in the proper strucutre
## Ensuring feature names don't have underscores or pipes
## Object representation is consistent with the most current Seurat version
## An object of class Seurat 
## 24914 features across 22113 samples within 1 assay 
## Active assay: RNA (24914 features)

Clean Metadata

Add Metadata

## Warning in instance$preRenderHook(instance): It seems your data is too big
## for client-side DataTables. You may consider server-side processing: https://
## rstudio.github.io/DT/server.html

An object of class Seurat 24914 features across 495 samples within 1 assay Active assay: RNA (24914 features)

Filter & Normalize Data

Subset Genes by Variance

Important!: * In ScaleData… + Specify do.par = F (unless you have parallel processing set up properly, this will cause your script to crash) + Specify num.cores = nCores (to use all available cores, determined by parallel::detectCores())

Regress out: number of unique transcripts (nUMI), % mitochondrial transcripts (percent.mito)

## Warning: The following arguments are not used: x.low.cutoff, x.high.cutoff,
## y.cutoff
## Total Genes: 14827
## Highly Variable Genes: 2000
## Regressing out nUMI, percent.mito
## Centering and scaling data matrix

Filtered Dimensions

## An object of class Seurat 
## 14827 features across 19144 samples within 1 assay 
## Active assay: RNA (14827 features)

Diagnostic Plots

You can mix genes and metadata variables in these plots just by listing them all in the features argument.

Dimensionality Reduction

PCA

ProjectPCA scores each gene in the dataset (including genes not included in the PCA) based on their correlation with the calculated components. Though we don’t use this further here, it can be used to identify markers that are strongly correlated with cellular heterogeneity, but may not have passed through variable gene selection. The results of the projected PCA can be explored by setting use.full=T in the functions above

  • Other Dim Reduction Methods in Seurat
    • RunCCA()
    • RunMultiCCA()
    • RunDiffusion()
    • RunPHATE()
    • RunICA()

Significant PCs

Determine statistically significant PCs for further analysis. NOTE: This process can take a long time for big datasets, comment out for expediency. More approximate techniques such as those implemented in PCElbowPlot() can be used to reduce computation time

UMAP

Additional UMAP arguments detailed here: https://umap-learn.readthedocs.io/en/latest/api.html#module-umap.umap_

NOTE: Specifying n.components = <large_number> take a little more time but helps clustering later on.

## Warning: The following arguments are not used: num_threads
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
## 20:06:16 UMAP embedding parameters a = 0.9922 b = 1.112
## 20:06:16 Read 19144 rows and found 10 numeric columns
## 20:06:16 Using Annoy for neighbor search, n_neighbors = 30
## 20:06:16 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 20:06:19 Writing NN index file to temp file /var/folders/sw/km9819713937hp640lj59ygw0000gn/T//RtmpP9OGec/file7aec5decc50e
## 20:06:19 Searching Annoy index using 1 thread, search_k = 3000
## 20:06:27 Annoy recall = 100%
## 20:06:27 Commencing smooth kNN distance calibration using 1 thread
## 20:06:29 Initializing from normalized Laplacian + noise
## 20:06:29 Commencing optimization for 200 epochs, with 741522 positive edges
## 20:06:47 Optimization finished

Find Cell Clusters

  • Seurat v3 applies a graph-based clustering approach, building upon initial strategies in (Macosko et al). Importantly, the distance metric which drives the clustering analysis (based on previously identified PCs) remains the same. However, our approach to partioning the cellular distance matrix into clusters has dramatically improved. Our approach was heavily inspired by recent manuscripts which applied graph-based clustering approaches to scRNA-seq data [SNN-Cliq, Xu and Su, Bioinformatics, 2015] and CyTOF data [PhenoGraph, Levine et al., Cell, 2015]. Briefly, these methods embed cells in a graph structure - for example a K-nearest neighbor (KNN) graph, with edges drawn between cells with similar feature expression patterns, and then attempt to partition this graph into highly interconnected ‘quasi-cliques’ or ‘communities’.

  • As in PhenoGraph, we first construct a KNN graph based on the euclidean distance in PCA space, and refine the edge weights between any two cells based on the shared overlap in their local neighborhoods (Jaccard similarity). This step is performed using the FindNeighbors function, and takes as input the previously defined dimensionality of the dataset (first 10 PCs).

  • To cluster the cells, we next apply modularity optimization techniques such as the Louvain algorithm (default) or SLM [SLM, Blondel et al., Journal of Statistical Mechanics], to iteratively group cells together, with the goal of optimizing the standard modularity function. The FindClusters function implements this procedure, and contains a resolution parameter that sets the ‘granularity’ of the downstream clustering, with increased values leading to a greater number of clusters. We find that setting this parameter between 0.4-1.2 typically returns good results for single-cell datasets of around 3K cells. Optimal resolution often increases for larger datasets. The clusters can be found using the Idents function.

IMPORTANT: Only use the most variable genes for the graph construction step of clustering for much better clustering

## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 19144
## Number of edges: 510072
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9593
## Number of communities: 3
## Elapsed time: 1 seconds
## Warning: Using `as.character()` on a quosure is deprecated as of rlang 0.3.0.
## Please use `as_label()` or `as_name()` instead.
## This warning is displayed once per session.

Cluster Biomarkers

Seurat has several tests for differential expression which can be set with the test.use parameter (see the DE vignette for details). For example, the ROC test returns the ‘classification power’ for any individual marker (ranging from 0 - random, to 1 - perfect).

Shown here: Biomarkers of each cluster vs. all other clusters.

Biomarkers Data

All Biomarkers

## Calculating cluster 0
## Calculating cluster 1
## Calculating cluster 2

Top Biomarkers

Cluster Biomarker: Volcano Plots

Uses the EnhancedVolcano library. Tutorial. .

##Construct the plot object
volcanoPlot <- function(DEG_df, 
                        caption="", 
                        topFC_labeled=Inf,
                        FC_cutoff=1, 
                        Q_cutoff=.05, 
                        show_plot=T){
  yMax  <- max(-log10(DEG_df$p_val_adj)) + max(-log10(DEG_df$p_val_adj))/3
  # IMPORTANT!# Must replace 0s with small numbers to avoid getting errors when taking the -log
  DEG_df[DEG_df$p_val_adj==0,"p_val_adj"] <- .Machine$double.xmin
  rownames(DEG_df) <- DEG_df$gene
 
  labeled_genes <- subset(DEG_df, p_val_adj<Q_cutoff | avg_logFC>=FC_cutoff)
  if(nrow(labeled_genes)>topFC_labeled){
     labeled_genes <- arrange(labeled_genes, p_val_adj, desc(avg_logFC))[1:topFC_labeled,]
  }
  
  xlimit <- max(abs(DEG_df$avg_logFC))*1.1
  
  # EnhancedVolcano library
  vol <- EnhancedVolcano::EnhancedVolcano(toptable = DEG_df,
                                   x="avg_logFC",
                                   y="p_val_adj",
                                   transcriptPointSize = 3,
                                   pCutoff = Q_cutoff, 
                                   FCcutoff = FC_cutoff,
                                   lab=rownames(DEG_df),
                                   cutoffLineCol = 'grey30',
                                   col = c("grey30", "forestgreen", "royalblue", "red2"), 
                                   legend = c("NA",
                                              paste("log2(FC) ≥", FC_cutoff),
                                              paste("q-value <",Q_cutoff),
                                              paste("q-value <",Q_cutoff,"& log2(FC) ≥", FC_cutoff)
                                              ),
                                   xlab = "log2(FC)",
                                   ylab = "-log10(q-value)",
                                   # legend = c("Bonferonni < .05",),
                                   # col = c("black","purple","turquoise"),
                                   title=caption,
                                   subtitle = "") + 
    xlim(c(-xlimit,xlimit)) + 
    labs(fill="DGE Group") +
    theme_bw()
  if(show_plot){print(vol)}
  return(vol)
  # # Custom volcano plot
  #   DEG_df$sig<-  ifelse( DEG_df$p_val_adj<0.05 & DEG_df$avg_logFC<1.5, "p_val_adj<0.05",
  #             ifelse( DEG_df$p_val_adj<0.05  & DEG_df$avg_logFC>1.5, "p_val_adj<0.05 & avg_logFC>1.5",
  #                 "p_val_adj>0.05"
  #         )) 
  #   DEG_df <- arrange(DEG_df, desc(sig))
  # vol <- ggplot(data=DEG_df, aes(x=avg_logFC, y= -log10(p_val_adj))) +
  #   geom_point(alpha=0.5, size=3, aes(col=sig)) +
  #   scale_color_manual(values=list("p_val_adj<0.05"="turquoise3",
  #                                  "p_val_adj<0.05 & avg_logFC>1.5"="purple",
  #                                  "p_val_adj>0.05" = "darkgray")) +
  #   theme(legend.position = "none") +
  #   xlab(expression(paste("Average ",log^{2},"(fold change)"))) +
  #   ylab(expression(paste(-log^{10},"(p-value)"))) +
  #   xlim(-2,2) + ylim(0, yMax) +
  #   ## ggrepl labels
  #   ggrepel::geom_text_repel(data=labeled_genes, 
  #                             aes(label=gene, x=avg_logFC, y= -log10(p_val_adj)),
  #                             color="black", alpha=.5,
  #                             segment.color="black", segment.alpha=.5
  #                   ) +
  #   # Lines
  #   geom_vline(xintercept= -1.5,lty=4, lwd=.3, alpha=.5) +
  #   geom_vline(xintercept= 1.5,lty=4, lwd=.3, alpha=.5) +
  #   geom_hline(yintercept= -log10(0.05),lty=4, lwd=.3, alpha=.5) +
  #   ggtitle(caption)
  # print(vol)
}

Cluster Volcano Plots

A “biomarker” is a set of genes that characterize a given cluster AND differntiate it from all other clusters.

Cluster 0 : Volcano

## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.

Cluster 1 : Volcano

## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.

Cluster 2 : Volcano

## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.

Top Biomarker Plot

Biomarker enrichment with gprofiler

Cluster 0

[1] “Biomarker: S100A12” “Biomarker: S100A8” “Biomarker: S100A9” [4] “Biomarker: VCAN” “Biomarker: CD14”

Cluster 1

[1] “Biomarker: FCGR3A” “Biomarker: RHOC” “Biomarker: CDKN1C” [4] “Biomarker: IFITM3” “Biomarker: MS4A7” [1] “No significant GO terms at Bonferonni-corrected p-value < 0.05.”

Cluster 2

[1] “Biomarker: FCER1A” “Biomarker: CLEC10A” “Biomarker: HLA-DQA1” [4] “Biomarker: HLA-DQA2” “Biomarker: HLA-DQB1” [1] “No significant GO terms at Bonferonni-corrected p-value < 0.05.”

Biomarkers Ridgeplot

## Picking joint bandwidth of 0.222
## Picking joint bandwidth of 0.14
## Picking joint bandwidth of 0.134
## Picking joint bandwidth of 0.151
## Picking joint bandwidth of 0.135
## Picking joint bandwidth of 0.0824
## Picking joint bandwidth of 0.0743
## Picking joint bandwidth of 0.0677
## Picking joint bandwidth of 0.202
## Picking joint bandwidth of 0.0789
## Picking joint bandwidth of 0.0981
## Picking joint bandwidth of 0.0902
## Picking joint bandwidth of 0.159
## Picking joint bandwidth of 0.084
## Picking joint bandwidth of 0.165

Map Clusters to Known Biomarkers

  • Known Monocytes Biomarkers
    • Classical: CD14++ / CD16–
    • Intermediate: CD14++ / CD16+
    • Nonclassical: CD14+ / CD16++ (not captured in this data)

The following plots show the absolute expression of each biomarker, as opposed to avg_logFC which is dependent on the expression patterns of other cell types being compared.

Markers Dataframe

## Warning in instance$preRenderHook(instance): It seems your data is too big
## for client-side DataTables. You may consider server-side processing: https://
## rstudio.github.io/DT/server.html

Identify Cell Types (under construction)

Garnett

Garnett works with the Monocle library so you need to convert the Seurat object to CDS format (which Monocle uses) first. NOTE: You must install the version of Garnett that is compatible with your version of Monocle. E.g. devtools::install_github("cole-trapnell-lab/garnett", ref="monocle3")

# Convert Seurat object to CDS object
load(file.path(root,"Data/seurat_object_add_HTO_ids.Rdata"), )
mDAT <- monocle::importCDS(seurat.obj,  import_all = F) 
remove(seurat.obj)

# generate size factors for normalization later
mDAT <- estimateSizeFactors(mDAT)

# Get pre-trained PBMC classifer 
# Download from: https://cole-trapnell-lab.github.io/garnett/classifiers
classifier_file <- "./Data/hsPBMC_20191017.RDS"
if(!file.exists(classifier_file)){
  download.file(url="https://cole-trapnell-lab.github.io/garnett/classifiers/hsPBMC_20191017.RDS", 
              destfile = "./Data/hsPBMC_20191017.RDS")
}
hsPBMC <- readRDS("./Data/hsPBMC_20191017.RDS") 
# Get feature genes for each cell type
feature_genes <- get_feature_genes(hsPBMC,
                                   node = "root",
                                   db = org.Hs.eg.db, 
                                   convert_ids = T)
head(feature_genes)
library(org.Hs.eg.db)
mDAT <- classify_cells(cds = mDAT, 
                       classifier = hsPBMC,
                        db = org.Hs.eg.db,
                         cluster_extend = TRUE,
                         cds_gene_id_type = "SYMBOL")
head(pData(mDAT))
table(pData(mDAT)$cell_type)
table(pData(mDAT)$cluster_ext_type) 



# Run tSNE: Plot Clusters and Cell Types 
mDAT <- reduceDimension(mDAT, max_components = 3, reduction_method = "tSNE") 
commonGeoms <- labs(x="tSNE1",y="tSNE2")

plot_grid(nrow = 2,
  qplot(data = pData(mDAT), mDAT@reducedDimA[1,], mDAT@reducedDimA[2,], color = cell_type) + theme_bw() + commonGeoms,
  qplot(data = pData(mDAT), mDAT@reducedDimA[1,], mDAT@reducedDimA[2,], color = cluster_ext_type) + theme_bw() + commonGeoms
)


# Unsupervised Clustering
mDAT <- clusterCells(mDAT, num_clusters = 5)
pData(mDAT)
plot_cell_clusters(mDAT, 1, 2, color = "Cluster",  markers = c("CD14", "FCGR3A"))
plot_cell_clusters(mDAT, 1, 2, color = "Cluster") + facet_wrap(~dx)
plot_cell_clusters(mDAT, 1, 2, color = "Cluster") + facet_wrap(~mut)
plot_cell_clusters(mDAT, 1, 2, color = "Cluster") + facet_wrap(~cell_type) 

DAT <- Seurat::AddMetaData(DAT, pData(mDAT)[c("garnett_cluster","cell_type","cluster_ext_type","Cluster")])

Pseudo-time

WARNING: Very computationally expensive!

Known Biomarkers: Heatmaps

Cells Separated

markerDF <- get_markerDF(DAT, markerList, 
             meta_vars =c("barcode", "dx", "mut","ID","post_clustering", "percent.mito","nGene", "nUMI"))
Spectral <- grDevices::colorRampPalette(RColorBrewer::brewer.pal(length(unique(DAT@meta.data$mut)), "Spectral"))

# DAT <- DoKMeans(DAT, k.genes = 3) 
# KMeansHeatmap(DAT)

if (T==F){
  # Spectral <- heatmaply::Spectral(length(unique(DAT@meta.data$mut)))
  markerMelt <- reshape2::acast(markerDF, Cell~Gene, value.var="Expression", fun.aggregate = mean, drop = F, fill = 0) 
  heatmaply::heatmaply(markerMelt,  key.title="Expression",#plot_method= "ggplot",
        k_row = dim(markerMelt)[2], dendrogram = "row",
        showticklabels = c(T, F), xlab = "Known Markers", ylab = "Cells", column_text_angle = 45, 
        row_side_colors =  DAT@meta.data[,c("dx","mut", "cell_type")], row_side_palette = Spectral
        )  %>%  colorbar(tickfont = list(size = 12), titlefont = list(size = 14), which = 2)  %>% 
        colorbar(tickfont = list(size = 12), titlefont = list(size = 14), which = 1)
 }else{ 
  # markerDF_sub <-subset(markerDF, Gene==markerList[1])  
  # var_to_colors(markerDF_sub, "post_clustering")  
  # library(pheatmap)
  # pheatmap(markerMelt, annotation_row = markerDF_sub[c("dx","mut","cell_type")])
  # pheatmap(markerMelt, kmeans_k = NA, annotation_row = markerDF_sub[c("dx","mut","cell_type")],
  #         cluster_cols = F, cutree_rows = length(unique(markerDF$post_clustering)),  angle_col=45 )
  library(RColorBrewer) 
  var_to_colors <- function(markerDF, metaVar){
    colors <- brewer.pal(length(unique(markerDF[metaVar]) ), "Dark2")
     sample(colors, length(unique(markerDF[metaVar])), replace = TRUE, prob = NULL)
    # metaColors <- colors[ subset(markerDF, Gene==markerList[1])[metaVar][,1] %>% as.factor() ]
    return(metaColors)
  }  
  # library(GMD)
  # myCols = cbind(var_to_colors(markerDF, "dx"), var_to_colors(markerDF, "mut")) 
  # rlab=t(cbind(
  #   var_to_colors(markerDF, "post_clustering"),
  #   var_to_colors(markerDF, "dx")
  #   ))
  #   heatmap.2(marker.matrix, key.title="Expression",  col = viridis(300), trace="none",Colv = F, Rowv = F,
  #             labRow = F, xlab = "Biomarker", ylab="Cell", cexCol=1, RowSideColors = var_to_colors(markerDF, "post_clustering")
  #             )
  # heatmap.3(markerMelt, dendrogram = 'row', kr = length(unique(markerDF)), labRow = F, 
  #           xlab = "Biomarker", ylab = "Cell", RowSideColors = rlab, RowSideColorsSize=2 )
   
  
  
  markerDF <- markerDF %>%    
    mutate_at(vars(post_clustering, dx, mut, ID), as.factor) %>% 
    mutate(Cluster = post_clustering) %>%
    arrange(post_clustering) 
  # markerMelt <- reshape2::acast(markerDF, Cell~Gene, value.var="Expression", fun.aggregate = mean, drop = F, fill = 0) 
  markerMelt <- dcast(markerDF,  Cell + post_clustering + dx + mut + ID ~ Gene,
                      fun.aggregate = mean, value.var = "Expression") %>% arrange(post_clustering)
  marker.matrix <- markerMelt[markerList] %>%as.matrix()
  row.names(marker.matrix) <- markerMelt$Cell
  
  ha = HeatmapAnnotation(df = markerDF[c("dx","mut","ID","post_clustering")], which = "row") 
  
  ComplexHeatmap::Heatmap(marker.matrix, col=viridis(300), column_title = "Biomarker", row_title = "Cell",  
                          row_dend_reorder = F,show_row_names = F, show_column_dend = F,show_row_dend =T,
                          cluster_rows = T, column_title_side = "bottom",km = length(unique(markerMelt$post_clustering))) + ha
  print(ha)
} 
## A HeatmapAnnotation object with 4 annotations
##   name: heatmap_annotation_0 
##   position: row 
##   items: 38288 
##   width: 21.0543794105438mm 
##   height: 1npc 
##   this object is subsetable
##   29.5326666666667mm extension on the bottom 
## 
##             name annotation_type color_mapping width
##               dx discrete vector        random   5mm
##              mut discrete vector        random   5mm
##               ID discrete vector        random   5mm
##  post_clustering discrete vector        random   5mm

DGE: All Cells

DGE methods available in Seurat include:

  • “wilcox” : Identifies differentially expressed genes between two groups of cells using a Wilcoxon Rank Sum test (default)

  • “bimod” : Likelihood-ratio test for single cell gene expression, (McDavid et al., Bioinformatics, 2013)

  • “roc” : Identifies ‘markers’ of gene expression using ROC analysis. For each gene, evaluates (using AUC) a classifier built on that gene alone, to classify between two groups of cells. An AUC value of 1 means that expression values for this gene alone can perfectly classify the two groupings (i.e. Each of the cells in cells.1 exhibit a higher level than each of the cells in cells.2). An AUC value of 0 also means there is perfect classification, but in the other direction. A value of 0.5 implies that the gene has no predictive power to classify the two groups. Returns a ‘predictive power’ (abs(AUC-0.5) * 2) ranked matrix of putative differentially expressed genes.

  • “t” : Identify differentially expressed genes between two groups of cells using the Student’s t-test.

  • “negbinom” : Identifies differentially expressed genes between two groups of cells using a negative binomial generalized linear model. Use only for UMI-based datasets

  • “poisson” : Identifies differentially expressed genes between two groups of cells using a poisson generalized linear model. Use only for UMI-based datasets

  • “LR” : Uses a logistic regression framework to determine differentially expressed genes. Constructs a logistic regression model predicting group membership based on each feature individually and compares this to a null model with a likelihood ratio test.

  • “MAST” : Identifies differentially expressed genes between two groups of cells using a hurdle model tailored to scRNA-seq data. Utilizes the MAST package to run the DE testing.

  • “DESeq2” : Identifies differentially expressed genes between two groups of cells based on a model using DESeq2 which uses a negative binomial distribution (Love et al, Genome Biology, 2014).This test does not support pre-filtering of genes based on average difference (or percent detection rate) between cell groups. However, genes may be pre-filtered based on their minimum detection rate (min.pct) across both cell groups. To use this method, please install DESeq2, using the instructions at https://bioconductor.org/packages/release/bioc/html/DESeq2.html

PD vs. Controls

## Assuming data assay in position 1, with name et is log-transformed.
## 
## Done!
## Combining coefficients and standard errors
## Calculating log-fold changes
## Calculating likelihood ratio tests
## Refitting on reduced model...
## 
## Done!
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
## Saving DGE results ==> ./Results/Seurat/DGE.results_dx_clusterallClusters.tsv

LRRK vs. PD

## Assuming data assay in position 1, with name et is log-transformed.
## 
## Done!
## Combining coefficients and standard errors
## Calculating log-fold changes
## Calculating likelihood ratio tests
## Refitting on reduced model...
## 
## Done!
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
## Saving DGE results ==> ./Results/Seurat/DGE.results_mut_clusterallClusters.tsv

DGE: Within Clusters

Between Disease Groups

Cluster 0- PD vs. control

An object of class Seurat 14827 features across 15971 samples within 1 assay Active assay: RNA (14827 features) 3 dimensional reductions calculated: pca, tsne, umap

## Assuming data assay in position 1, with name et is log-transformed.
## 
## Done!
## Combining coefficients and standard errors
## Warning in melt(coefAndCI, as.is = TRUE): The melt generic in data.table has
## been passed a array and will attempt to redirect to the relevant reshape2
## method; please note that reshape2 is deprecated, and this redirection is now
## deprecated as well. To continue using melt methods from reshape2 while both
## libraries are attached, e.g. melt.list, you can prepend the namespace like
## reshape2::melt(coefAndCI). In the next version, this warning will become an
## error.
## Calculating log-fold changes
## Warning in melt(lfc): The melt generic in data.table has been passed a list
## and will attempt to redirect to the relevant reshape2 method; please note that
## reshape2 is deprecated, and this redirection is now deprecated as well. To
## continue using melt methods from reshape2 while both libraries are attached,
## e.g. melt.list, you can prepend the namespace like reshape2::melt(lfc). In the
## next version, this warning will become an error.
## Calculating likelihood ratio tests
## Refitting on reduced model...
## 
## Done!
## Warning in melt(llrt): The melt generic in data.table has been passed a list
## and will attempt to redirect to the relevant reshape2 method; please note that
## reshape2 is deprecated, and this redirection is now deprecated as well. To
## continue using melt methods from reshape2 while both libraries are attached,
## e.g. melt.list, you can prepend the namespace like reshape2::melt(llrt). In the
## next version, this warning will become an error.
## Saving DGE results ==> ./Results/Seurat/DGE.results_dx_cluster0.tsv
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.

Cluster 1- PD vs. control

An object of class Seurat 14827 features across 2432 samples within 1 assay Active assay: RNA (14827 features) 3 dimensional reductions calculated: pca, tsne, umap

## Assuming data assay in position 1, with name et is log-transformed.
## 
## Done!
## Combining coefficients and standard errors
## Warning in melt(coefAndCI, as.is = TRUE): The melt generic in data.table has
## been passed a array and will attempt to redirect to the relevant reshape2
## method; please note that reshape2 is deprecated, and this redirection is now
## deprecated as well. To continue using melt methods from reshape2 while both
## libraries are attached, e.g. melt.list, you can prepend the namespace like
## reshape2::melt(coefAndCI). In the next version, this warning will become an
## error.
## Calculating log-fold changes
## Warning in melt(lfc): The melt generic in data.table has been passed a list
## and will attempt to redirect to the relevant reshape2 method; please note that
## reshape2 is deprecated, and this redirection is now deprecated as well. To
## continue using melt methods from reshape2 while both libraries are attached,
## e.g. melt.list, you can prepend the namespace like reshape2::melt(lfc). In the
## next version, this warning will become an error.
## Calculating likelihood ratio tests
## Refitting on reduced model...
## 
## Done!
## Warning in melt(llrt): The melt generic in data.table has been passed a list
## and will attempt to redirect to the relevant reshape2 method; please note that
## reshape2 is deprecated, and this redirection is now deprecated as well. To
## continue using melt methods from reshape2 while both libraries are attached,
## e.g. melt.list, you can prepend the namespace like reshape2::melt(llrt). In the
## next version, this warning will become an error.
## Saving DGE results ==> ./Results/Seurat/DGE.results_dx_cluster1.tsv
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.

Cluster 2- PD vs. control

An object of class Seurat 14827 features across 741 samples within 1 assay Active assay: RNA (14827 features) 3 dimensional reductions calculated: pca, tsne, umap

## Assuming data assay in position 1, with name et is log-transformed.
## 
## Done!
## Combining coefficients and standard errors
## Warning in melt(coefAndCI, as.is = TRUE): The melt generic in data.table has
## been passed a array and will attempt to redirect to the relevant reshape2
## method; please note that reshape2 is deprecated, and this redirection is now
## deprecated as well. To continue using melt methods from reshape2 while both
## libraries are attached, e.g. melt.list, you can prepend the namespace like
## reshape2::melt(coefAndCI). In the next version, this warning will become an
## error.
## Calculating log-fold changes
## Warning in melt(lfc): The melt generic in data.table has been passed a list
## and will attempt to redirect to the relevant reshape2 method; please note that
## reshape2 is deprecated, and this redirection is now deprecated as well. To
## continue using melt methods from reshape2 while both libraries are attached,
## e.g. melt.list, you can prepend the namespace like reshape2::melt(lfc). In the
## next version, this warning will become an error.
## Calculating likelihood ratio tests
## Refitting on reduced model...
## 
## Done!
## Warning in melt(llrt): The melt generic in data.table has been passed a list
## and will attempt to redirect to the relevant reshape2 method; please note that
## reshape2 is deprecated, and this redirection is now deprecated as well. To
## continue using melt methods from reshape2 while both libraries are attached,
## e.g. melt.list, you can prepend the namespace like reshape2::melt(llrt). In the
## next version, this warning will become an error.
## Saving DGE results ==> ./Results/Seurat/DGE.results_dx_cluster2.tsv
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.

Between Mutation Groups

Cluster 0- LRRK2 vs. PD

An object of class Seurat 14827 features across 15971 samples within 1 assay Active assay: RNA (14827 features) 3 dimensional reductions calculated: pca, tsne, umap

## Assuming data assay in position 1, with name et is log-transformed.
## 
## Done!
## Combining coefficients and standard errors
## Calculating log-fold changes
## Calculating likelihood ratio tests
## Refitting on reduced model...
## 
## Done!
## Saving DGE results ==> ./Results/Seurat/DGE.results_mut_cluster0.tsv
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.

Cluster 1- LRRK2 vs. PD

An object of class Seurat 14827 features across 2432 samples within 1 assay Active assay: RNA (14827 features) 3 dimensional reductions calculated: pca, tsne, umap

## Assuming data assay in position 1, with name et is log-transformed.
## 
## Done!
## Combining coefficients and standard errors
## Calculating log-fold changes
## Calculating likelihood ratio tests
## Refitting on reduced model...
## 
## Done!
## Saving DGE results ==> ./Results/Seurat/DGE.results_mut_cluster1.tsv
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.

Cluster 2- LRRK2 vs. PD

An object of class Seurat 14827 features across 741 samples within 1 assay Active assay: RNA (14827 features) 3 dimensional reductions calculated: pca, tsne, umap

## Assuming data assay in position 1, with name et is log-transformed.
## 
## Done!
## Combining coefficients and standard errors
## Calculating log-fold changes
## Calculating likelihood ratio tests
## Refitting on reduced model...
## 
## Done!
## Saving DGE results ==> ./Results/Seurat/DGE.results_mut_cluster2.tsv
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.

Enrichment

## Welcome to enrichR
## Checking connection ... Connection is Live!

Enrichr on Clusters

Cluster 0

Uploading data to Enrichr… Done. Querying KEGG_2018… Done. Querying Reactome_2016… Done. Querying GO_Biological_Process_2018… Done. Querying GO_Molecular_Function_2018… Done. Querying GO_Cellular_Component_2018… Done. Querying Rare_Diseases_AutoRIF_ARCHS4_Predictions… Done. Querying Human_Gene_Atlas… Done. Parsing results… Done.

KEGG_2018

Reactome_2016

GO_Biological_Process_2018

GO_Molecular_Function_2018

GO_Cellular_Component_2018

Rare_Diseases_AutoRIF_ARCHS4_Predictions

Human_Gene_Atlas

Cluster 1

Uploading data to Enrichr… Done. Querying KEGG_2018… Done. Querying Reactome_2016… Done. Querying GO_Biological_Process_2018… Done. Querying GO_Molecular_Function_2018… Done. Querying GO_Cellular_Component_2018… Done. Querying Rare_Diseases_AutoRIF_ARCHS4_Predictions… Done. Querying Human_Gene_Atlas… Done. Parsing results… Done.

KEGG_2018

Reactome_2016

GO_Biological_Process_2018

GO_Molecular_Function_2018

GO_Cellular_Component_2018

Rare_Diseases_AutoRIF_ARCHS4_Predictions

Human_Gene_Atlas

Cluster 2

Uploading data to Enrichr… Done. Querying KEGG_2018… Done. Querying Reactome_2016… Done. Querying GO_Biological_Process_2018… Done. Querying GO_Molecular_Function_2018… Done. Querying GO_Cellular_Component_2018… Done. Querying Rare_Diseases_AutoRIF_ARCHS4_Predictions… Done. Querying Human_Gene_Atlas… Done. Parsing results… Done.

KEGG_2018

Reactome_2016

GO_Biological_Process_2018

GO_Molecular_Function_2018

GO_Cellular_Component_2018

Rare_Diseases_AutoRIF_ARCHS4_Predictions

Human_Gene_Atlas

WGCNA Eigengenes

Determine whether each of the clusters in scRNA-seq data are enriched for WGCNA eigengenes (a weighted vector of all genes representing each co-expression module).

https://ucdavis-bioinformatics-training.github.io/2017_2018-single-cell-RNA-sequencing-Workshop-UCD_UCB_UCSF/day3/scRNA_Workshop-PART6.html